Chao Yin is mainly responsible for collecting team and player game stats, while Zeyu Yang is responsible for players’ biographical and salary information.
Our data is collected from Basketball-Reference, Stats NBA and Kaggle.
Basketball Reference is a site providing both basic and sabermetric statistics and resources for basketball fans using official NBA data.
Stats NBA is the home of NBA Advanced Stats and provides official NBA Statistics and advanced analytics.
Kaggle is an online community that allows users to find and publish data sets.
Data on Basketball-Reference is stored in XML, so we can extract it directly using the XML and RCurl packages. However, some tables on this site are commented out and can only be downloaded manually as CSV files, so we chose Stats NBA for the other data. Extracting data tables from Stats NBA is a bit harder than from Basketball-Reference because they are stored in JSON files. We use statsnbaR, which provides utility functions to download data from the API endpoints of Stats NBA. We got teams from Basketball-Reference and players from Stats NBA.
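The table extraction described above can be sketched as follows. This is a minimal illustration rather than the project's actual script: it assumes the XML package is installed, and the HTML snippet is a made-up stand-in for a downloaded page so that the example runs offline.

```r
library(XML)

# Made-up stand-in for a downloaded page. With a network connection the
# text could instead be fetched with RCurl::getURL().
page_text <- '
<html><body>
<table id="demo">
  <tr><th>Team</th><th>W</th><th>L</th></tr>
  <tr><td>Team A</td><td>50</td><td>32</td></tr>
  <tr><td>Team B</td><td>41</td><td>41</td></tr>
</table>
</body></html>'

# Parse the markup and pull the first table into a data frame.
doc <- htmlParse(page_text, asText = TRUE)
demo_table <- readHTMLTable(doc, which = 1, header = TRUE,
                            stringsAsFactors = FALSE)

# Cells arrive as text, so numeric columns still need converting.
demo_table$W <- as.numeric(demo_table$W)
demo_table$L <- as.numeric(demo_table$L)
```

Tables that are commented out in the page source are not visible to `readHTMLTable()` in this form, which is consistent with the manual CSV download mentioned above.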
Kaggle is the source of players’ biographical data. The two sites above can provide the same data, but it is harder to collect there since it is not stored in tables.
The players datasets contain all regular-season information for all players in one season.
General data covers players’ basic performance, including:
Profile information like Name, Team, Age, Games Played, Minutes Played, etc.
Shooting performance from 2-pointers, 3-pointers and free throws, like Field Goals Made, Field Goals Attempted, Field Goal Percentage, etc.
Basic stats per game like Rebounds, Assists, Steals, Blocks, Points, Turnovers, Personal Fouls, etc.
Advanced data measures and analyzes a player’s ability in one specific area:
Overall ratings like Offensive Rating, Defensive Rating, Net Rating, Player Impact Estimate, Usage Percentage, etc.
Passing/Assist ability like Assist Percentage, Assist to Turnover Ratio, Assist Ratio
Rebound ability like Offensive Rebound Percentage, Defensive Rebound Percentage, Rebound Percentage
Shooting ability like Effective Field Goal Percentage, True Shooting Percentage
The bio dataset contains players’ biographical data:
The year a player started playing in the NBA and the year he retired
Height and weight data
Birth date
College attended
The teams datasets contain similar information to the players datasets, but for each team in the league. In addition, teams provides ways to split the data in order to measure teams’ performance from different angles:
Location measures teams’ performance at home and on the road separately
Wins-Losses shows how a team played in games it won versus games it lost
Month and Pre/Post All-Star show how teams’ performance changes over time
Days Rest tests teams’ ability to handle tough schedules
NBA teams kept changing over these 15 years. Three teams changed their locations and names, so the set of teams is not necessarily the same each year. Players can be traded and signed during the season, so some players have more records than others in these datasets.
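To illustrate why some players have more records than others, the sketch below uses made-up rows in the style of the players data, where `TOT` in the `Tm` column marks a combined line for a player who appeared for more than one team. The rule for keeping one row per player is only one possible choice and is not necessarily what was done in this project.

```r
# Made-up rows: Player A was traded mid-season, Player B was not.
season_rows <- data.frame(
  Player = c("Player A", "Player A", "Player A", "Player B"),
  Tm     = c("TOT", "HOU", "CLE", "DAL"),
  G      = c(70, 40, 30, 82),
  stringsAsFactors = FALSE
)

# Number of records each player has in this season.
records_per_player <- table(season_rows$Player)

# One possible rule: keep the TOT line for traded players and the single
# line for everyone else, so each player-season appears once.
traded_names <- names(records_per_player)[records_per_player > 1]
is_traded <- season_rows$Player %in% traded_names
one_row_each <- season_rows[!is_traded | season_rows$Tm == "TOT", ]
```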
In addition, all data are stored as factors, so we need to convert them to numeric or character.
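The factor conversion has a well-known pitfall, shown here on made-up values: calling `as.numeric()` directly on a factor returns the internal level codes, not the printed numbers, so the conversion has to go through character first.

```r
# A made-up column that was read in as a factor.
minutes <- factor(c("34.5", "12.0", "28.3"))

# as.numeric() on a factor returns the internal level codes,
# so go through character first to recover the printed values.
minutes_wrong <- as.numeric(minutes)
minutes_right <- as.numeric(as.character(minutes))

# Name-like columns only need the character conversion.
team <- as.character(factor(c("HOU", "CLE", "HOU")))
```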
After we had all the raw data in data/raw, we wanted to combine them into four datasets: Team_splits, Team_shooting, Player and Players_bio.
For the players’ data, we first remove empty rows and columns and convert the variables to numeric or character according to their content. Because more and more players can play more than one position today, we group the players into three kinds (Guards, Wings and Bigs) instead of using their original positions. Finally, we combine the players data of all 15 years to get Player.
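The grouping and stacking steps might look like the sketch below. The position lookup is an assumption made for illustration, since the exact rule used to form the three groups is not listed here; the helper `group_position()` and the two small season tables are hypothetical as well.

```r
# ASSUMPTION: this lookup is illustrative; the exact rule used to form
# the three groups is not stated in the report.
position_group <- c(PG = "Guards", SG = "Guards",
                    SF = "Wings",
                    PF = "Bigs",   C = "Bigs")

# Hypothetical helper that maps a raw position label to a group.
group_position <- function(pos) {
  # For hybrid labels such as "PG-SG", use the first listed position.
  primary <- sub("-.*$", "", pos)
  factor(unname(position_group[primary]),
         levels = c("Guards", "Wings", "Bigs"))
}

# Made-up season tables, stacked into one data frame with a year column.
season_2018 <- data.frame(Player = c("Player A", "Player B"),
                          Pos = c("PG", "C-PF"), year = 2018,
                          stringsAsFactors = FALSE)
season_2019 <- data.frame(Player = c("Player A", "Player C"),
                          Pos = c("SG-SF", "SF"), year = 2019,
                          stringsAsFactors = FALSE)
Player_demo <- rbind(season_2018, season_2019)
Player_demo$Pos <- group_position(Player_demo$Pos)
```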
Scroll down the table to see more details
print(dfSummary(Player,headings = FALSE,plain.ascii = FALSE,valid.col = FALSE,graph.magnif = 0.75,style = "grid"),max.tbl.height = 500,method='render')
| No | Variable | Stats / Values | Freqs (% of Valid) | Missing |
|---|---|---|---|---|
| 1 | Player [character] | 1. Corey Brewer 2. Kyle Korver 3. Andre Miller 4. Devin Harris 5. Mike James 6. Nazr Mohammed 7. Trevor Ariza 8. Drew Gooden 9. Pau Gasol 10. Shaun Livingston [ 1636 others ] | | 0 (0%) |
| 2 | Pos [factor] | 1. Guards 2. Wings 3. Bigs | | 0 (0%) |
| 3 | Age [numeric] | Mean (sd) : 26.7 (4.2) min < med < max: 18 < 26 < 44 IQR (CV) : 7 (0.2) | 26 distinct values | 0 (0%) |
| 4 | Tm [character] | 1. TOT 2. HOU 3. CLE 4. MEM 5. NYK 6. LAC 7. PHI 8. WAS 9. DAL 10. MIL [ 26 others ] | | 0 (0%) |
| 5 | G [numeric] | Mean (sd) : 46.8 (26.2) min < med < max: 1 < 51 < 85 IQR (CV) : 49 (0.6) | 85 distinct values | 0 (0%) |
| 6 | GS [numeric] | Mean (sd) : 22.1 (27.4) min < med < max: 0 < 7 < 83 IQR (CV) : 40 (1.2) | 84 distinct values | 0 (0%) |
| 7 | MP [numeric] | Mean (sd) : 19.6 (9.9) min < med < max: 0 < 19 < 43.1 IQR (CV) : 16 (0.5) | 410 distinct values | 0 (0%) |
| 8 | FG [numeric] | Mean (sd) : 2.9 (2.1) min < med < max: 0 < 2.4 < 12.2 IQR (CV) : 2.8 (0.7) | 111 distinct values | 0 (0%) |
| 9 | FGA [numeric] | Mean (sd) : 6.6 (4.5) min < med < max: 0 < 5.5 < 27.2 IQR (CV) : 6.1 (0.7) | 228 distinct values | 0 (0%) |
| 10 | FG% [numeric] | Mean (sd) : 0.4 (0.1) min < med < max: 0 < 0.4 < 1 IQR (CV) : 0.1 (0.2) | 458 distinct values | 52 (0.54%) |
| 11 | 3P [numeric] | Mean (sd) : 0.6 (0.7) min < med < max: 0 < 0.3 < 5.1 IQR (CV) : 1 (1.2) | 44 distinct values | 0 (0%) |
| 12 | 3PA [numeric] | Mean (sd) : 1.7 (1.8) min < med < max: 0 < 1.1 < 13.2 IQR (CV) : 2.7 (1.1) | 95 distinct values | 0 (0%) |
| 13 | 3P% [numeric] | Mean (sd) : 0.3 (0.2) min < med < max: 0 < 0.3 < 1 IQR (CV) : 0.2 (0.6) | 380 distinct values | 1467 (15.35%) |
| 14 | 2P [numeric] | Mean (sd) : 2.3 (1.8) min < med < max: 0 < 1.8 < 10.3 IQR (CV) : 2.3 (0.8) | 99 distinct values | 0 (0%) |
| 15 | 2PA [numeric] | Mean (sd) : 4.9 (3.6) min < med < max: 0 < 3.9 < 22.2 IQR (CV) : 4.7 (0.7) | 198 distinct values | 0 (0%) |
| 16 | 2P% [numeric] | Mean (sd) : 0.5 (0.1) min < med < max: 0 < 0.5 < 1 IQR (CV) : 0.1 (0.2) | 446 distinct values | 95 (0.99%) |
| 17 | eFG% [numeric] | Mean (sd) : 0.5 (0.1) min < med < max: 0 < 0.5 < 1.5 IQR (CV) : 0.1 (0.2) | 473 distinct values | 52 (0.54%) |
| 18 | FT [numeric] | Mean (sd) : 1.4 (1.4) min < med < max: 0 < 1 < 10.3 IQR (CV) : 1.4 (1) | 92 distinct values | 0 (0%) |
| 19 | FTA [numeric] | Mean (sd) : 1.9 (1.7) min < med < max: 0 < 1.4 < 11.7 IQR (CV) : 1.8 (0.9) | 112 distinct values | 0 (0%) |
| 20 | FT% [numeric] | Mean (sd) : 0.7 (0.2) min < med < max: 0 < 0.8 < 1 IQR (CV) : 0.2 (0.2) | 582 distinct values | 456 (4.77%) |
| 21 | ORB [numeric] | Mean (sd) : 0.9 (0.8) min < med < max: 0 < 0.6 < 6 IQR (CV) : 0.9 (0.9) | 54 distinct values | 0 (0%) |
| 22 | DRB [numeric] | Mean (sd) : 2.5 (1.8) min < med < max: 0 < 2.1 < 12 IQR (CV) : 2 (0.7) | 111 distinct values | 0 (0%) |
| 23 | TRB [numeric] | Mean (sd) : 3.4 (2.4) min < med < max: 0 < 2.8 < 18 IQR (CV) : 2.8 (0.7) | 148 distinct values | 0 (0%) |
| 24 | AST [numeric] | Mean (sd) : 1.7 (1.7) min < med < max: 0 < 1.2 < 12.8 IQR (CV) : 1.8 (1) | 114 distinct values | 0 (0%) |
| 25 | STL [numeric] | Mean (sd) : 0.6 (0.4) min < med < max: 0 < 0.5 < 2.9 IQR (CV) : 0.5 (0.7) | 30 distinct values | 0 (0%) |
| 26 | BLK [numeric] | Mean (sd) : 0.4 (0.5) min < med < max: 0 < 0.2 < 6 IQR (CV) : 0.4 (1.2) | 39 distinct values | 0 (0%) |
| 27 | TOV [numeric] | Mean (sd) : 1.1 (0.8) min < med < max: 0 < 1 < 5.7 IQR (CV) : 0.9 (0.7) | 51 distinct values | 0 (0%) |
| 28 | PF [numeric] | Mean (sd) : 1.8 (0.8) min < med < max: 0 < 1.8 < 6 IQR (CV) : 1.1 (0.5) | 46 distinct values | 0 (0%) |
| 29 | PTS [numeric] | Mean (sd) : 7.8 (5.8) min < med < max: 0 < 6.4 < 36.1 IQR (CV) : 7.7 (0.7) | 301 distinct values | 0 (0%) |
| 30 | PER [numeric] | Mean (sd) : 12.6 (6) min < med < max: -54.4 < 12.5 < 133.8 IQR (CV) : 5.9 (0.5) | 412 distinct values | 3 (0.03%) |
| 31 | TS% [numeric] | Mean (sd) : 0.5 (0.1) min < med < max: 0 < 0.5 < 1.5 IQR (CV) : 0.1 (0.2) | 481 distinct values | 25 (0.26%) |
| 32 | 3PAr [numeric] | Mean (sd) : 0.2 (0.2) min < med < max: 0 < 0.2 < 1 IQR (CV) : 0.4 (0.9) | 784 distinct values | 26 (0.27%) |
| 33 | FTr [numeric] | Mean (sd) : 0.3 (0.2) min < med < max: 0 < 0.3 < 6 IQR (CV) : 0.2 (0.7) | 778 distinct values | 26 (0.27%) |
| 34 | ORB% [numeric] | Mean (sd) : 5.5 (4.7) min < med < max: 0 < 4 < 100 IQR (CV) : 6.2 (0.9) | 222 distinct values | 3 (0.03%) |
| 35 | DRB% [numeric] | Mean (sd) : 14.5 (6.5) min < med < max: 0 < 13.4 < 100 IQR (CV) : 8.5 (0.4) | 354 distinct values | 3 (0.03%) |
| 36 | TRB% [numeric] | Mean (sd) : 10 (5) min < med < max: 0 < 8.9 < 86.4 IQR (CV) : 7.2 (0.5) | 265 distinct values | 3 (0.03%) |
| 37 | AST% [numeric] | Mean (sd) : 12.7 (9.2) min < med < max: 0 < 9.8 < 78.5 IQR (CV) : 11 (0.7) | 470 distinct values | 3 (0.03%) |
| 38 | STL% [numeric] | Mean (sd) : 1.6 (0.9) min < med < max: 0 < 1.5 < 12.5 IQR (CV) : 0.8 (0.5) | 80 distinct values | 3 (0.03%) |
| 39 | BLK% [numeric] | Mean (sd) : 1.6 (1.7) min < med < max: 0 < 1 < 26.3 IQR (CV) : 1.8 (1.1) | 109 distinct values | 3 (0.03%) |
| 40 | TOV% [numeric] | Mean (sd) : 14 (6.1) min < med < max: 0 < 13.2 < 100 IQR (CV) : 5.9 (0.4) | 341 distinct values | 21 (0.22%) |
| 41 | USG% [numeric] | Mean (sd) : 18.6 (5.2) min < med < max: 0 < 18.2 < 53.7 IQR (CV) : 6.8 (0.3) | 334 distinct values | 3 (0.03%) |
| 42 | OWS [numeric] | Mean (sd) : 1.2 (2) min < med < max: -3.3 < 0.5 < 14.8 IQR (CV) : 1.9 (1.6) | 156 distinct values | 0 (0%) |
| 43 | DWS [numeric] | Mean (sd) : 1.2 (1.1) min < med < max: -0.6 < 0.9 < 9.1 IQR (CV) : 1.4 (1) | 80 distinct values | 0 (0%) |
| 44 | WS [numeric] | Mean (sd) : 2.4 (2.8) min < med < max: -2.1 < 1.5 < 20.3 IQR (CV) : 3.5 (1.2) | 184 distinct values | 0 (0%) |
| 45 | WS/48 [numeric] | Mean (sd) : 0.1 (0.1) min < med < max: -1.3 < 0.1 < 2.7 IQR (CV) : 0.1 (1.4) | 557 distinct values | 3 (0.03%) |
| 46 | OBPM [numeric] | Mean (sd) : -1.7 (3.5) min < med < max: -46.4 < -1.5 < 68.6 IQR (CV) : 3.4 (-2.1) | 283 distinct values | 0 (0%) |
| 47 | DBPM [numeric] | Mean (sd) : -0.5 (2.1) min < med < max: -23.1 < -0.5 < 17.1 IQR (CV) : 2.5 (-4.4) | 185 distinct values | 0 (0%) |
| 48 | BPM [numeric] | Mean (sd) : -2.1 (4.2) min < med < max: -59 < -1.8 < 54.4 IQR (CV) : 4.2 (-2) | 334 distinct values | 0 (0%) |
| 49 | VORP [numeric] | Mean (sd) : 0.5 (1.3) min < med < max: -2.2 < 0 < 12.4 IQR (CV) : 1.1 (2.4) | 112 distinct values | 0 (0%) |
| 50 | year [integer] | Mean (sd) : 2011.7 (4.7) min < med < max: 2004 < 2012 < 2019 IQR (CV) : 8 (0) | 16 distinct values | 0 (0%) |
For Players_bio data, we join players’ data and biographical data and turn the variables into numerics and characters according to their content.
Scroll down the table to see more details
print(dfSummary(players_bio,headings = FALSE,plain.ascii = FALSE,valid.col = FALSE,graph.magnif = 0.75,style = "grid"),max.tbl.height = 500,method='render')
| No | Variable | Stats / Values | Freqs (% of Valid) | Missing |
|---|---|---|---|---|
| 1 | Rk [numeric] | Mean (sd) : 239.6 (137.8) min < med < max: 1 < 239 < 540 IQR (CV) : 238 (0.6) | 540 distinct values | 0 (0%) |
| 2 | Player [character] | 1. Mike James 2. Mike Dunleavy 3. Chris Johnson 4. David Lee 5. Corey Brewer 6. Kyle Korver 7. Andre Miller 8. Devin Harris 9. Nazr Mohammed 10. Trevor Ariza [ 1636 others ] | | 0 (0%) |
| 3 | Pos [character] | 1. SG 2. PF 3. PG 4. C 5. SF 6. C-PF 7. PG-SG 8. SF-SG 9. PF-SF 10. SG-SF [ 5 others ] | | 0 (0%) |
| 4 | Age [numeric] | Mean (sd) : 26.6 (4.2) min < med < max: 18 < 26 < 44 IQR (CV) : 7 (0.2) | 26 distinct values | 0 (0%) |
| 5 | Tm [character] | 1. TOT 2. HOU 3. CLE 4. NYK 5. MEM 6. PHI 7. LAC 8. MIL 9. WAS 10. DAL [ 26 others ] | | 0 (0%) |
| 6 | G [numeric] | Mean (sd) : 46.6 (26.3) min < med < max: 1 < 51 < 85 IQR (CV) : 49 (0.6) | 85 distinct values | 0 (0%) |
| 7 | GS [numeric] | Mean (sd) : 21.9 (27.4) min < med < max: 0 < 7 < 83 IQR (CV) : 40 (1.2) | 84 distinct values | 0 (0%) |
| 8 | MP [numeric] | Mean (sd) : 1078 (877.4) min < med < max: 0 < 887 < 3424 IQR (CV) : 1483 (0.8) | 2828 distinct values | 0 (0%) |
| 9 | FG [numeric] | Mean (sd) : 166.1 (164.4) min < med < max: 0 < 114 < 978 IQR (CV) : 225 (1) | 727 distinct values | 0 (0%) |
| 10 | FGA [numeric] | Mean (sd) : 366.9 (352.8) min < med < max: 0 < 260 < 2173 IQR (CV) : 489 (1) | 1370 distinct values | 0 (0%) |
| 11 | FG% [numeric] | Mean (sd) : 0.4 (0.1) min < med < max: 0 < 0.4 < 1 IQR (CV) : 0.1 (0.2) | 458 distinct values | 53 (0.55%) |
| 12 | 3P [numeric] | Mean (sd) : 33.1 (46.4) min < med < max: 0 < 10 < 402 IQR (CV) : 52 (1.4) | 249 distinct values | 0 (0%) |
| 13 | 3PA [numeric] | Mean (sd) : 93 (122.9) min < med < max: 0 < 34 < 1028 IQR (CV) : 146 (1.3) | 552 distinct values | 0 (0%) |
| 14 | 3P% [numeric] | Mean (sd) : 0.3 (0.2) min < med < max: 0 < 0.3 < 1 IQR (CV) : 0.2 (0.6) | 380 distinct values | 1490 (15.33%) |
| 15 | 2P [numeric] | Mean (sd) : 133 (140.6) min < med < max: 0 < 85 < 798 IQR (CV) : 174 (1.1) | 644 distinct values | 0 (0%) |
| 16 | 2PA [numeric] | Mean (sd) : 273.9 (280.7) min < med < max: 0 < 182 < 1655 IQR (CV) : 350 (1) | 1140 distinct values | 0 (0%) |
| 17 | 2P% [numeric] | Mean (sd) : 0.5 (0.1) min < med < max: 0 < 0.5 < 1 IQR (CV) : 0.1 (0.2) | 446 distinct values | 98 (1.01%) |
| 18 | eFG% [numeric] | Mean (sd) : 0.5 (0.1) min < med < max: 0 < 0.5 < 1.5 IQR (CV) : 0.1 (0.2) | 473 distinct values | 53 (0.55%) |
| 19 | FT [numeric] | Mean (sd) : 80.1 (98.8) min < med < max: 0 < 44 < 756 IQR (CV) : 99 (1.2) | 515 distinct values | 0 (0%) |
| 20 | FTA [numeric] | Mean (sd) : 105.7 (125.1) min < med < max: 0 < 61 < 916 IQR (CV) : 129 (1.2) | 615 distinct values | 0 (0%) |
| 21 | FT% [numeric] | Mean (sd) : 0.7 (0.2) min < med < max: 0 < 0.8 < 1 IQR (CV) : 0.2 (0.2) | 582 distinct values | 475 (4.89%) |
| 22 | ORB [numeric] | Mean (sd) : 48.4 (57.1) min < med < max: 0 < 28 < 440 IQR (CV) : 56 (1.2) | 310 distinct values | 0 (0%) |
| 23 | DRB [numeric] | Mean (sd) : 139.5 (137.5) min < med < max: 0 < 102 < 894 IQR (CV) : 174 (1) | 650 distinct values | 0 (0%) |
| 24 | TRB [numeric] | Mean (sd) : 187.8 (188.3) min < med < max: 0 < 133 < 1247 IQR (CV) : 228 (1) | 837 distinct values | 0 (0%) |
| 25 | AST [numeric] | Mean (sd) : 97.4 (123.3) min < med < max: 0 < 53 < 925 IQR (CV) : 117 (1.3) | 610 distinct values | 0 (0%) |
| 26 | STL [numeric] | Mean (sd) : 33.6 (32.5) min < med < max: 0 < 24 < 217 IQR (CV) : 44 (1) | 179 distinct values | 0 (0%) |
| 27 | BLK [numeric] | Mean (sd) : 21.2 (30.4) min < med < max: 0 < 10 < 307 IQR (CV) : 23 (1.4) | 208 distinct values | 0 (0%) |
| 28 | TOV [numeric] | Mean (sd) : 61.4 (59.5) min < med < max: 0 < 44 < 464 IQR (CV) : 78 (1) | 304 distinct values | 0 (0%) |
| 29 | PF [numeric] | Mean (sd) : 93.3 (71.2) min < med < max: 0 < 83 < 332 IQR (CV) : 117 (0.8) | 304 distinct values | 0 (0%) |
| 30 | PTS [numeric] | Mean (sd) : 445.5 (448.9) min < med < max: 0 < 303 < 2832 IQR (CV) : 600 (1) | 1656 distinct values | 0 (0%) |
| 31 | Year [numeric] | Mean (sd) : 2011.7 (4.7) min < med < max: 2004 < 2012 < 2019 IQR (CV) : 8 (0) | 16 distinct values | 0 (0%) |
| 32 | year_start [integer] | Mean (sd) : 2006.5 (6.7) min < med < max: 1952 < 2007 < 2018 IQR (CV) : 10 (0) | 44 distinct values | 275 (2.83%) |
| 33 | year_end [integer] | Mean (sd) : 2014.3 (5) min < med < max: 1958 < 2016 < 2018 IQR (CV) : 6 (0) | 31 distinct values | 275 (2.83%) |
| 34 | position [character] | 1. C 2. C-F 3. F 4. F-C 5. F-G 6. G 7. G-F | | 275 (2.83%) |
| 35 | height [character] | 1. 6-9 2. 6-7 3. 6-10 4. 6-8 5. 6-6 6. 6-11 7. 6-3 8. 6-5 9. 7-0 10. 6-4 [ 12 others ] | | 275 (2.83%) |
| 36 | weight [integer] | Mean (sd) : 219.8 (26.9) min < med < max: 135 < 220 < 360 IQR (CV) : 40 (0.1) | 120 distinct values | 275 (2.83%) |
| 37 | birth_date [character] | 1. June 26, 1984 2. June 1, 1985 3. March 25, 1986 4. May 19, 1976 5. August 17, 1986 6. March 5, 1986 7. December 2, 1978 8. September 28, 1982 9. October 26, 1985 10. April 1, 1988 [ 1406 others ] | | 275 (2.83%) |
| 38 | college [character] | 1. 2. University of Kentucky 3. Duke University 4. University of North Carol 5. University of California, 6. University of Kansas 7. University of Arizona 8. University of Connecticut 9. University of Florida 10. University of Texas at Au [ 224 others ] | | 275 (2.83%) |
For the teams’ data, we split them into two datasets, Team_splits and Team_shooting.
Team_splits contains all the ‘per game’ stats for each of the 30 teams in every season. We choose the ‘Location’ filter because every team plays 41 home games and 41 road games each year, so we can simply take the mean of the two splits to get seasonal average stats. We changed the format, removed the ranking variables, combined the basic and advanced data, and put all 15 years of data into this one dataset.
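The reason a plain mean works is that both splits carry the same weight: with home average h and road average r, the season average is (41h + 41r) / 82 = (h + r) / 2. The sketch below checks this on made-up numbers; the column names `location` and `gp` are assumptions for illustration, not names taken from the raw data.

```r
# Made-up per-game splits: one Home row and one Road row per team-season.
location_splits <- data.frame(
  team     = c("Team A", "Team A", "Team B", "Team B"),
  year     = 2019,
  location = c("Home", "Road", "Home", "Road"),
  gp       = 41,
  pts      = c(110, 104, 100, 98)
)

# Unweighted mean of the two rows per team and year.
season_avg <- aggregate(pts ~ team + year, data = location_splits, FUN = mean)

# Mean weighted by games played; identical here because gp is 41 in
# both rows of every team.
weighted <- sapply(split(location_splits, location_splits$team),
                   function(d) weighted.mean(d$pts, d$gp))
```

If a split ever had unequal game counts, the weighted version would be the safer choice.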
Scroll down the table to see more details
print(dfSummary(Team_splits,headings = FALSE,plain.ascii = FALSE,valid.col = FALSE,graph.magnif = 0.75,style = "grid"),max.tbl.height = 500,method='render')
| No | Variable | Stats / Values | Freqs (% of Valid) | Missing |
|---|---|---|---|---|
| 1 | team [character] | 1. Atlanta Hawks 2. Boston Celtics 3. Chicago Bulls 4. Cleveland Cavaliers 5. Dallas Mavericks 6. Denver Nuggets 7. Detroit Pistons 8. Golden State Warriors 9. Houston Rockets 10. Indiana Pacers [ 26 others ] | | 0 (0%) |
| 2 | pctWins [numeric] | Mean (sd) : 0.5 (0.2) min < med < max: 0.1 < 0.5 < 0.9 IQR (CV) : 0.2 (0.3) | 115 distinct values | 0 (0%) |
| 3 | fgm [numeric] | Mean (sd) : 37.5 (2.1) min < med < max: 32.4 < 37.3 < 44 IQR (CV) : 2.7 (0.1) | 168 distinct values | 0 (0%) |
| 4 | fga [numeric] | Mean (sd) : 82.5 (3.6) min < med < max: 74.2 < 82.2 < 94 IQR (CV) : 5.1 (0) | 220 distinct values | 0 (0%) |
| 5 | pctFG [numeric] | Mean (sd) : 0.5 (0) min < med < max: 0.4 < 0.5 < 0.5 IQR (CV) : 0 (0) | 125 distinct values | 0 (0%) |
| 6 | fg3m [numeric] | Mean (sd) : 7.4 (2.3) min < med < max: 2.8 < 7 < 16.1 IQR (CV) : 3 (0.3) | 164 distinct values | 0 (0%) |
| 7 | fg3a [numeric] | Mean (sd) : 20.7 (6.1) min < med < max: 8.2 < 19.5 < 45.3 IQR (CV) : 8.3 (0.3) | 294 distinct values | 0 (0%) |
| 8 | pctFG3 [numeric] | Mean (sd) : 0.4 (0) min < med < max: 0.3 < 0.4 < 0.4 IQR (CV) : 0 (0.1) | 478 distinct values | 0 (0%) |
| 9 | pctFT [numeric] | Mean (sd) : 0.8 (0) min < med < max: 0.7 < 0.8 < 0.8 IQR (CV) : 0 (0) | 206 distinct values | 0 (0%) |
| 10 | fg2m [numeric] | Mean (sd) : 30.1 (1.9) min < med < max: 23.1 < 30.2 < 35.2 IQR (CV) : 2.4 (0.1) | 151 distinct values | 0 (0%) |
| 11 | fg2a [numeric] | Mean (sd) : 61.8 (4.6) min < med < max: 41.9 < 62.1 < 74.3 IQR (CV) : 6.1 (0.1) | 253 distinct values | 0 (0%) |
| 12 | pctFG2 [numeric] | Mean (sd) : 0.5 (0) min < med < max: 0.4 < 0.5 < 0.6 IQR (CV) : 0 (0) | 479 distinct values | 0 (0%) |
| 13 | ftm [numeric] | Mean (sd) : 18.2 (2) min < med < max: 12.2 < 18.1 < 24.1 IQR (CV) : 2.6 (0.1) | 153 distinct values | 0 (0%) |
| 14 | fta [numeric] | Mean (sd) : 24 (2.6) min < med < max: 16.6 < 23.9 < 31.6 IQR (CV) : 3.3 (0.1) | 196 distinct values | 0 (0%) |
| 15 | oreb [numeric] | Mean (sd) : 11 (1.3) min < med < max: 7.6 < 10.9 < 14.6 IQR (CV) : 1.7 (0.1) | 113 distinct values | 0 (0%) |
| 16 | dreb [numeric] | Mean (sd) : 31.5 (2.1) min < med < max: 26.9 < 31.2 < 40.5 IQR (CV) : 3 (0.1) | 159 distinct values | 0 (0%) |
| 17 | treb [numeric] | Mean (sd) : 42.4 (2) min < med < max: 36.8 < 42.2 < 49.7 IQR (CV) : 2.7 (0) | 154 distinct values | 0 (0%) |
| 18 | ast [numeric] | Mean (sd) : 21.9 (2) min < med < max: 17.4 < 21.6 < 30.4 IQR (CV) : 2.6 (0.1) | 157 distinct values | 0 (0%) |
| 19 | tov [numeric] | Mean (sd) : 14.4 (1.1) min < med < max: 11.2 < 14.4 < 17.7 IQR (CV) : 1.4 (0.1) | 106 distinct values | 0 (0%) |
| 20 | stl [numeric] | Mean (sd) : 7.5 (0.9) min < med < max: 5.5 < 7.5 < 10 IQR (CV) : 1.1 (0.1) | 81 distinct values | 0 (0%) |
| 21 | blk [numeric] | Mean (sd) : 4.9 (0.8) min < med < max: 2.4 < 4.8 < 8.2 IQR (CV) : 1 (0.2) | 78 distinct values | 0 (0%) |
| 22 | blka [numeric] | Mean (sd) : 4.9 (0.7) min < med < max: 3 < 4.9 < 6.9 IQR (CV) : 0.9 (0.1) | 71 distinct values | 0 (0%) |
| 23 | pf [numeric] | Mean (sd) : 20.9 (1.7) min < med < max: 16.6 < 20.8 < 26.7 IQR (CV) : 2.4 (0.1) | 137 distinct values | 0 (0%) |
| 24 | pts [numeric] | Mean (sd) : 100.5 (5.9) min < med < max: 85.5 < 99.7 < 118.2 IQR (CV) : 7.6 (0.1) | 296 distinct values | 0 (0%) |
| 25 | pfd [numeric] | Mean (sd) : 19.5 (5.1) min < med < max: 0 < 20.4 < 25.6 IQR (CV) : 2.2 (0.3) | 119 distinct values | 32 (6.68%) |
| 26 | pctAST [numeric] | Mean (sd) : 0.6 (0) min < med < max: 0.5 < 0.6 < 0.7 IQR (CV) : 0.1 (0.1) | 237 distinct values | 0 (0%) |
| 27 | pctOREB [numeric] | Mean (sd) : 0.3 (0) min < med < max: 0.2 < 0.3 < 0.4 IQR (CV) : 0 (0.1) | 191 distinct values | 0 (0%) |
| 28 | pctDREB [numeric] | Mean (sd) : 0.7 (0) min < med < max: 0.7 < 0.7 < 0.8 IQR (CV) : 0 (0) | 174 distinct values | 0 (0%) |
| 29 | pctTREB [numeric] | Mean (sd) : 0.5 (0) min < med < max: 0.5 < 0.5 < 0.5 IQR (CV) : 0 (0) | 119 distinct values | 0 (0%) |
| 30 | pctTOVTeam [numeric] | Mean (sd) : 0.2 (0) min < med < max: 0.1 < 0.2 < 0.2 IQR (CV) : 0 (0.1) | 112 distinct values | 0 (0%) |
| 31 | pctEFG [numeric] | Mean (sd) : 0.5 (0) min < med < max: 0.4 < 0.5 < 0.6 IQR (CV) : 0 (0) | 172 distinct values | 0 (0%) |
| 32 | pctTS [numeric] | Mean (sd) : 0.5 (0) min < med < max: 0.5 < 0.5 < 0.6 IQR (CV) : 0 (0) | 151 distinct values | 0 (0%) |
| 33 | ortgE [numeric] | Mean (sd) : 104.1 (3.7) min < med < max: 92.3 < 103.9 < 113.9 IQR (CV) : 5.4 (0) | 227 distinct values | 0 (0%) |
| 34 | ortg [numeric] | Mean (sd) : 105.7 (3.7) min < med < max: 94.4 < 105.3 < 114.9 IQR (CV) : 5.1 (0) | 232 distinct values | 0 (0%) |
| 35 | drtgE [numeric] | Mean (sd) : 104.1 (3.6) min < med < max: 91.6 < 104.2 < 115.1 IQR (CV) : 5.1 (0) | 229 distinct values | 0 (0%) |
| 36 | drtg [numeric] | Mean (sd) : 105.7 (3.5) min < med < max: 93.1 < 105.8 < 116.8 IQR (CV) : 4.9 (0) | 223 distinct values | 0 (0%) |
| 37 | netrtgE [numeric] | Mean (sd) : 0 (5) min < med < max: -15.5 < 0 < 12.1 IQR (CV) : 7 (672.1) | 274 distinct values | 0 (0%) |
| 38 | netrtg [numeric] | Mean (sd) : 0 (4.7) min < med < max: -15.1 < 0.1 < 11.4 IQR (CV) : 6.8 (420.7) | 269 distinct values | 0 (0%) |
| 39 | ratioASTtoTO [numeric] | Mean (sd) : 1.5 (0.2) min < med < max: 1 < 1.5 < 2.1 IQR (CV) : 0.3 (0.1) | 151 distinct values | 0 (0%) |
| 40 | ratioAST [numeric] | Mean (sd) : 16.8 (1.2) min < med < max: 14.1 < 16.7 < 21.2 IQR (CV) : 1.5 (0.1) | 106 distinct values | 0 (0%) |
| 41 | paceE [numeric] | Mean (sd) : 95.7 (3.5) min < med < max: 88.6 < 95.3 < 106.5 IQR (CV) : 4.9 (0) | 227 distinct values | 0 (0%) |
| 42 | pace [numeric] | Mean (sd) : 94.3 (3.4) min < med < max: 87.4 < 93.9 < 104.6 IQR (CV) : 4.8 (0) | 432 distinct values | 0 (0%) |
| 43 | ratioPIE [numeric] | Mean (sd) : 0.5 (0) min < med < max: 0.4 < 0.5 < 0.6 IQR (CV) : 0 (0.1) | 211 distinct values | 0 (0%) |
| 44 | year [integer] | Mean (sd) : 2011.5 (4.6) min < med < max: 2004 < 2012 < 2019 IQR (CV) : 7.5 (0) | 16 distinct values | 0 (0%) |
Team_shooting contains the shooting performance of each team from different regions of the court. We cleaned it the same way as Team_splits.
Scroll down the table to see more details
print(dfSummary(Team_shooting,headings = FALSE,plain.ascii = FALSE,valid.col = FALSE,graph.magnif = 0.75,style = "grid"),max.tbl.height = 500,method='render')
| No | Variable | Stats / Values | Freqs (% of Valid) | Missing |
|---|---|---|---|---|
| 1 | team [character] | 1. Atlanta Hawks 2. Boston Celtics 3. Chicago Bulls 4. Cleveland Cavaliers 5. Dallas Mavericks 6. Denver Nuggets 7. Detroit Pistons 8. Golden State Warriors 9. Houston Rockets 10. Indiana Pacers [ 26 others ] | | 0 (0%) |
| 2 | distance [character] | 1. 16-24 ft. 2. 24+ ft. 3. 8-16 ft. 4. Back Court Shot 5. Less Than 8 ft. | | 0 (0%) |
| 3 | fgm [numeric] | Mean (sd) : 607.1 (528.8) min < med < max: 0 < 474 < 2259 IQR (CV) : 467.5 (0.9) | 939 distinct values | 0 (0%) |
| 4 | fga [numeric] | Mean (sd) : 1335.9 (949.2) min < med < max: 3 < 1230 < 3891 IQR (CV) : 1225 (0.7) | 1309 distinct values | 0 (0%) |
| 5 | pctFG [numeric] | Mean (sd) : 0.3 (0.2) min < med < max: 0 < 0.4 < 0.6 IQR (CV) : 0.1 (0.5) | 293 distinct values | 0 (0%) |
| 6 | year [integer] | Mean (sd) : 2011.5 (4.6) min < med < max: 2004 < 2012 < 2019 IQR (CV) : 8 (0) | 16 distinct values | 0 (0%) |
As the tables above show, Team_shooting has no missing values, and Team_splits has none apart from 32 missing values (6.68%) in pfd. Since Player and Players_bio are similar to each other, we display only the missing values of Players_bio here.
visna(players_bio)
Figure 1: Missing values
Figure 1 shows that the majority of the data has no missing values.
Rows that are missing year_start are also missing all of the following variables. This is because these columns come from another table, bio. Although the bio table itself has no missing values, it does not contain every player that appears in the Player data.
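This pattern is what a left join produces. The sketch below uses made-up rows and assumes the join key is the player name, which is an assumption for illustration: a player without a match in bio ends up with every bio column missing at once.

```r
# Made-up player and bio tables; Player C has no bio record.
players <- data.frame(Player = c("Player A", "Player B", "Player C"),
                      PTS = c(900, 450, 120),
                      stringsAsFactors = FALSE)
bio <- data.frame(Player = c("Player A", "Player B"),
                  year_start = c(2005L, 2012L),
                  weight = c(220L, 195L),
                  stringsAsFactors = FALSE)

# Left join: keep every row of players, attach bio columns where available.
joined <- merge(players, bio, by = "Player", all.x = TRUE)

# Rows without a bio match are missing every bio column at once.
missing_bio <- joined[is.na(joined$year_start), ]
```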
Also, we can see that there are quite some rows missing 3PA values, FT values etc. These varibales are related to player’s shooting data per season. The missing values mean that these players do not shoot that season.
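A numeric count can complement the visual check in Figure 1. The sketch below uses base R on a small made-up data frame that stands in for players_bio; only the column names yearstart and 3PA come from the text, and every value is invented for illustration.

```r
# Toy stand-in for players_bio; every value here is invented for illustration.
toy_bio <- data.frame(
  player    = c("A", "B", "C", "D"),
  yearstart = c(2010, NA, 2015, 2003),
  `3PA`     = c(1.2, 0.5, NA, NA),
  check.names = FALSE
)

# Number of missing values in each column.
na_per_column <- colSums(is.na(toy_bio))

# Share of rows that have no missing value at all.
complete_share <- mean(complete.cases(toy_bio))

print(na_per_column)
print(complete_share)
```

Running the same two calls on the real players_bio table would give the per-column counts that the plot summarizes graphically.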
Team_splits %>% select(year, pts, pace) %>% group_by(year) %>% summarise(Pace = mean(pace), Points = mean(pts)) %>%
gather(key = 'type', value = 'value', -year) %>%
ggplot(aes(x = year, y = value)) +
geom_line(color = '#C9082A', size = 2) +
geom_point(color = '#C9082A', size = 4) +
geom_point(color = '#FFFFFF', size = 2) +
#scale_color_manual(values = c('#17408B', '#C9082A')) +
facet_grid(type ~ ., scales = 'free_y') +
scale_x_continuous(labels = unique(Team_splits$year), breaks = unique(Team_splits$year)) +
xlab('') +
ylab('') +
ggtitle('Pace and Points Per Game') +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5),
strip.text.y = element_text(size = 10, colour = '#FFFFFF', face = 'bold'),
strip.background = element_rect(fill = '#17408B', colour = 'white'),
plot.title = element_text(size = 17.5, face = 'bold'),
legend.position = 'none')
There is an obvious trend in both Pace (the number of possessions a team uses per game) and PPG (Points Per Game) in the NBA over the past 15 years. From 2004 to 2013, pace and PPG fluctuated around 93 and 98 respectively. From 2014 on, both stats kept growing; in 2019 in particular, pace rose to 101 from 98 the previous year and PPG increased by nearly 6 points over the previous season. Pace and PPG are positively associated, since the more possessions a team has, the more chances it has to score.
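The association between pace and PPG can also be checked numerically. The following is a minimal sketch in base R, assuming a table shaped like Team_splits with year, pace and pts columns; the toy values are invented and are not the real league data.

```r
# Toy stand-in for Team_splits; the numbers are invented for illustration only.
toy_splits <- data.frame(
  year = c(2004, 2004, 2005, 2005, 2006, 2006),
  pace = c(90, 92, 94, 96, 99, 101),
  pts  = c(93, 95, 98, 100, 104, 108)
)

# League-average pace and points for each season (a base R equivalent of
# group_by(year) followed by summarise() with mean()).
yearly <- aggregate(cbind(pace, pts) ~ year, data = toy_splits, FUN = mean)

# Pearson correlation between the yearly averages.
pace_pts_cor <- cor(yearly$pace, yearly$pts)

print(yearly)
print(pace_pts_cor)
```

A correlation close to 1 on the real yearly averages would support the visual impression from the two panels.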
Team_splits %>% select(year, ortg, pctWins) %>% group_by(year) %>%
ggplot(aes(x = year, y=ortg, alpha = pctWins, color = 1-pctWins)) +
geom_jitter(size = 2) +
geom_smooth(linetype = 'longdash', colour = '#C9082A', se = FALSE, size = 2) +
scale_x_continuous(labels = unique(Team_splits$year), breaks = unique(Team_splits$year)) +
ggtitle('Average Offensive Rating Per Game') +
xlab('') +
ylab('') +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5),
plot.title = element_text(size = 17.5, face = 'bold'),
legend.position = 'none')
This plot shows the offrtg (offensive rating, a statistic used to measure a team’s offensive performance) of each team over these 15 years, with the dashed line tracing the smoothed trend. The color reflects each team’s Win Percentage: the darker the marker, the more the team wins. Offensive rating started growing in 2013 and reached an unprecedented level in 2018. This raises a question: apart from the faster pace, are there other reasons for such high offensive performance in recent years?
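One way to separate the effect of pace from the effect of efficiency is the usual definition of offensive rating as points scored per 100 possessions, which implies that PPG is roughly offensive rating times pace divided by 100. The sketch below assumes that definition and treats pace as possessions per game; the function names are illustrative, the inputs 93 and 98 are the approximate 2004 to 2013 levels quoted earlier, and the "pace only" scenario is hypothetical.

```r
# Offensive rating: points scored per 100 possessions.
offensive_rating <- function(points, possessions) {
  100 * points / possessions
}

# Points per game implied by a given rating and pace.
points_per_game <- function(ortg, pace) {
  ortg * pace / 100
}

# Approximate 2004-2013 levels quoted in the text: pace about 93, PPG about 98.
ortg_early <- offensive_rating(points = 98, possessions = 93)

# Hypothetical scenario: same efficiency, but a faster pace of 101.
ppg_pace_only <- points_per_game(ortg_early, pace = 101)

print(round(ortg_early, 1))
print(round(ppg_pace_only, 1))
```

If the observed PPG is higher than what this pace-only scenario gives, the remainder has to come from scoring more per possession, which is what the rising offensive rating suggests.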
p1 <- Team_splits %>% select(year, fg3a, fg2a) %>%
gather(key = 'type', value = 'attempt', -c(year)) %>%
group_by(year, type) %>% summarise(attempt = mean(attempt)) %>%
ggplot(aes(x = year, y = attempt, group = year)) +
#geom_boxplot(aes(color = type)) +
#geom_line() +
geom_bar(stat = 'identity', fill = '#C9082A') +
facet_grid(type ~., scales = 'free_y', labeller = as_labeller(c(`fg2a` = '2 pointer', `fg3a` = '3 pointer'))) +
scale_color_manual(values = c('#17408B', '#C9082A')) +
scale_x_continuous(labels = unique(Team_splits$year), breaks = unique(Team_splits$year)) +
xlab('') +
ylab('') +
#ylim(0, 2500) +
ggtitle('Field Goals Attempt') +
theme_minimal() +
theme(axis.text.x = element_text(angle = 60, vjust = 0.5),
strip.text.y = element_text(size = 10, colour = '#FFFFFF', face = 'bold'),
strip.background = element_rect(fill = '#17408B', colour = 'white'),
plot.title = element_text(size = 17.5, face = 'bold'),
legend.position = 'none')
p2 <- Team_splits %>% select(year, pctFG3, pctFG2) %>%
gather(key = 'type', value = 'percentage', -c(year)) %>%
group_by(year, type) %>% summarise(percentage = mean(percentage)) %>%
ggplot(aes(x = year, y = percentage)) +
geom_line(color = '#C9082A', size = 2) +
geom_point(color = '#C9082A', size = 4) +
geom_point(color = '#FFFFFF', size = 2) +
facet_grid(type ~., scales = 'free_y', labeller = as_labeller(c(`pctFG2` = '2 pointer', `pctFG3` = '3 pointer'))) +
scale_x_continuous(labels = unique(Team_splits$year), breaks = unique(Team_splits$year)) +
xlab('') +
ylab('') +
ggtitle('Field Goals Percentage') +
theme_minimal() +
theme(axis.text.x = element_text(angle = 60, vjust = 0.5),
strip.text.y = element_text(size = 10, colour = '#FFFFFF', face = 'bold'),
strip.background = element_rect(fill = '#17408B', colour = 'white'),
plot.title = element_text(size = 17.5, face = 'bold'))
grid.arrange(p1, p2, ncol = 2)
In basketball, a field goal is a basket scored on any shot or tap other than a free throw, worth two or three points depending on the distance of the attempt from the basket.
This plot shows the league-average FGA (Field Goal Attempts) and FG% (Field Goal Percentage) for both 2 pointers and 3 pointers.
In the left plot, we find that NBA teams are attempting more and more 3 pointers year after year without giving up many 2 pointer attempts. In 2019, FGA for 3 pointers is more than twice what it was 15 years earlier. Also in 2019, FGA for 3 pointers is above 30 and FGA for 2 pointers is below 60, which means that on average about one in every three shots in an NBA game was a 3 pointer.
The right plot shows the FG% of 2 pointers and 3 pointers from 2004 to 2019. The FG% for 2 pointers has kept growing since 2012 and has stayed above 50% since 2017, while the FG% for 3 pointers fluctuates between 35% and 36% in most years. Teams are evidently making their 2 pointer shots more efficient.
From these two plots, we can see that the strategy NBA teams use to score more is to attempt more 3 pointers while making their 2 pointer shots more efficient.
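The arithmetic behind these observations can be sketched in a few lines of R. The attempt counts (30 and 60) and the percentages (0.355 and 0.50) are round numbers taken from the ranges quoted above, not exact league figures, and the helper name points_per_attempt is illustrative.

```r
# Approximate 2019 per-game attempts quoted in the text.
fga3 <- 30
fga2 <- 60

# Share of all field goal attempts that are 3 pointers.
share_3pt <- fga3 / (fga3 + fga2)

# Expected points per attempt = point value * FG%.
# 0.355 and 0.50 are round numbers taken from the ranges in the text.
points_per_attempt <- function(value, pct) value * pct
pps3 <- points_per_attempt(3, 0.355)
pps2 <- points_per_attempt(2, 0.50)

print(share_3pt)
print(c(three = pps3, two = pps2))
```

Under these assumptions a 3 pointer is worth slightly more per attempt than an average 2 pointer, which is consistent with teams shifting attempts toward 3 pointers.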
Team_shooting$distance <- factor(Team_shooting$distance, levels = unique(Team_shooting$distance))
p1 <- Team_shooting %>% filter(distance != 'Back Court Shot') %>% select(distance, fga, year) %>% group_by(year, distance) %>% summarise_all(mean) %>%
ggplot(aes(x = year, y = fga/82, group = year)) +
#geom_boxplot() +
geom_bar(stat = 'identity', fill = '#C9082A') +
facet_grid(distance ~ ., scales = 'free_y') +
scale_x_continuous(labels = unique(Team_splits$year), breaks = unique(Team_splits$year)) +
xlab('') +
ylab('') +
#ylim(0,1500) +
ggtitle('Field Goals Attempt by Distance') +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5),
legend.position = 'none',
strip.text.y = element_text(size = 10, colour = '#FFFFFF', face = 'bold'),
strip.background = element_rect(fill = '#17408B', colour = 'white'),
plot.title = element_text(size = 17.5, face = 'bold'))
p2 <- Team_shooting %>% filter(distance != 'Back Court Shot') %>% select(distance, pctFG, year) %>% group_by(year, distance) %>% summarise_all(mean) %>%
ggplot(aes(x = year, y = pctFG)) +
geom_line(color = '#C9082A', size = 2) +
geom_point(color = '#C9082A', size = 4) +
geom_point(color = '#FFFFFF', size = 2) +
facet_grid(distance ~ ., scales = 'free_y') +
scale_x_continuous(labels = unique(Team_splits$year), breaks = unique(Team_splits$year)) +
xlab('') +
ylab('') +
ggtitle('Field Goals Percentage by Distance') +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5),
legend.position = 'none',
strip.text.y = element_text(size = 10, colour = '#FFFFFF', face = 'bold'),
strip.background = element_rect(fill = '#17408B', colour = 'white'),
plot.title = element_text(size = 17.5, face = 'bold'))
grid.arrange(p1, p2, ncol = 2)
This plot shows the FGA and FG% of shots from different regions of the court. The distance is how far the shooting spot is from the basket. Shots from beyond 23 feet 9 inches (22 feet in the corners) are 3 pointers and the others are 2 pointers, so the 24+ ft. data are similar to the 3 pointer data in the plot above.
This plot decomposes 2 pointer shots into 3 types: ‘near basket’ (less than 8 ft.), ‘mid-range’ (8-16 ft.) and ‘long-range’ (16-24 ft.).
In the left plot, ‘near basket’ 2 pointers have the highest FGA of all; it reaches 30 in 2019, which is even more than the sum of the other two types. Meanwhile, ‘long-range’ attempts keep going down and ‘mid-range’ attempts remain around 12. Since the difficulty of making a field goal rises with the distance from the basket, ‘long-range’ shots seem to be less valuable than ‘near basket’ ones. In the right plot, the FG% of ‘near basket’ shots is far above the others and reached 58% in 2019, while the FG% of ‘mid-range’ shots also keeps rising.
This may explain how NBA teams manage to keep taking more 3 pointers while also raising the FG% of their 2 pointers: they take fewer shots from ‘low efficiency’ regions and focus more on shots near the basket.
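The mechanism can be illustrated with a weighted average: the overall 2 pointer FG% is total makes divided by total attempts across the three zones, so moving attempts toward the most accurate zone raises it even if no zone improves. In the sketch below only the 58% near-basket figure comes from the text; the other percentages and both shot mixes are hypothetical.

```r
# Overall FG% across zones: total makes divided by total attempts.
overall_pct <- function(attempts, pct) {
  sum(attempts * pct) / sum(attempts)
}

# FG% by zone. 0.58 is the 2019 near-basket figure from the text;
# the mid-range and long-range values are hypothetical.
zone_pct <- c(near = 0.58, mid = 0.42, long = 0.40)

# Two hypothetical shot mixes with the same total of 60 attempts.
mix_old <- c(near = 24, mid = 12, long = 24)
mix_new <- c(near = 36, mid = 12, long = 12)

pct_old <- overall_pct(mix_old, zone_pct)
pct_new <- overall_pct(mix_new, zone_pct)

print(c(old = pct_old, new = pct_new))
```

With identical zone percentages, shifting 12 attempts from long-range to near-basket is enough to lift the overall figure in this toy example.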
Team_splits %>% select(year, pctTS, pctWins) %>% group_by(year) %>%
ggplot(aes(x = year, y=pctTS, alpha = pctWins, color = 1-pctWins)) +
geom_jitter(size = 2) +
geom_smooth(linetype = 'longdash', colour = '#C9082A', se = FALSE, size = 2) +
scale_x_continuous(labels = unique(Team_splits$year), breaks = unique(Team_splits$year)) +
ggtitle('Average True Shooting Percentage Per Game') +
xlab('') +
ylab('') +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5),
plot.title = element_text(size = 17.5, face = 'bold'),
legend.position = 'none')
TS% (True Shooting Percentage, a measure of shooting efficiency) combines field goal percentage, free throw percentage, and three-point field goal percentage into a single number instead of treating them individually, which gives a more accurate picture of shooting. As before, the darker the marker, the more the team wins. The TS% curve has a similar shape to the offrtg curve, and teams today shoot much more efficiently than they did 15 years ago.
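For reference, TS% is commonly computed as points divided by twice the sum of field goal attempts and 0.44 times free throw attempts. The sketch below assumes that standard formula; the function name true_shooting and the sample totals are hypothetical and are not taken from the data.

```r
# True shooting percentage: PTS / (2 * (FGA + 0.44 * FTA)).
true_shooting <- function(pts, fga, fta) {
  pts / (2 * (fga + 0.44 * fta))
}

# Hypothetical single-game team totals, for illustration only.
ts_example <- true_shooting(pts = 110, fga = 88, fta = 25)

print(round(ts_example, 3))
```

Because points already include the extra value of 3 pointers and free throws, a team can raise its TS% by changing its shot mix even when its plain FG% stays flat.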
A work by Chao Yin & Zeyu Yang
cy2507@columbia.edu | zy2327@columbia.edu